Automatic Generation of SIMD DSP Code
Abstract
Short vector SIMD instructions on recent microprocessors, such as SSE on the Pentium III and 4, speed up code but pose a major challenge to software developers. This report introduces a compiler that automatically generates C code enhanced with short vector instructions for digital signal processing (DSP) transforms, such as the fast Fourier transform (FFT). The input to the compiler is a concise mathematical description of a DSP algorithm in the language SPL. SPL is used in the Spiral system (http://www.ece.cmu.edu/~spiral) to generate highly optimized, architecture-adapted implementations of DSP transforms. Interfacing the newly developed compiler with Spiral yields speed-ups of up to a factor of 2 in several important cases, including the FFT and the discrete cosine transform (DCT), which is used, for instance, in the JPEG compression standard. For the FFT, the automatically generated code is competitive with the hand-coded Intel Math Kernel Library (MKL).
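To make the idea of mapping a mathematical description onto short vector instructions concrete, here is a minimal sketch in plain C. It is not output of the compiler described above, and the abstract does not say which formula constructs the compiler handles. The sketch assumes a construct that is common in formula-based FFT descriptions, the tensor product DFT_2 ⊗ I_4, and a vector length of 4 floats (as in SSE). The function name `dft2_tensor_i4` and the constant `VLEN` are hypothetical.

```c
#include <stddef.h>

/* Assumed vector length: 4 single-precision floats, as in SSE. */
enum { VLEN = 4 };

/*
 * Hypothetical helper, not taken from the report.
 * Computes y = (DFT_2 (x) I_4) x for 8 real inputs, where DFT_2 is the
 * 2x2 butterfly matrix [[1, 1], [1, -1]].
 *
 * The loop body touches x[i] and x[i + VLEN] with unit stride in i, so the
 * four iterations correspond to one 4-wide vector add and one 4-wide
 * vector subtract on a short vector SIMD unit.
 */
void dft2_tensor_i4(const float *x, float *y)
{
    for (size_t i = 0; i < VLEN; ++i) {
        float a = x[i];          /* element i of the lower half */
        float b = x[i + VLEN];   /* element i of the upper half */
        y[i]        = a + b;     /* first row of DFT_2  */
        y[i + VLEN] = a - b;     /* second row of DFT_2 */
    }
}
```

Calling `dft2_tensor_i4` on the input 1, 2, ..., 8 adds and subtracts the two halves element by element. The point of the sketch is only that an operation of the form A ⊗ I_4 works on whole 4-element blocks, which is why it lends itself to short vector code without shuffling data inside a vector.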
Similar resources
A HW/SW design methodology for embedded SIMD vector signal processors
SIMD processors have made their way from supercomputer architectures to embedded real-time signal processing. This trend has been driven by signal processing applications with heavy number-crunching requirements, for example base-band processing on mobile devices. Depending on the data dependencies of algorithms and implementation constraints like real-time, power consumption and die ...
Short Vector SIMD Code Generation for DSP Algorithms
Short vector SIMD instructions on recent general purpose microprocessors, such as SSE on Pentium III and 4, offer a high potential speed-up but require a very high level of programming expertise. We present a compiler that generates vectorized code for digital signal processing algorithms such as the fast Fourier transform (FFT). The input to our compiler is a mathematical description of the al...
Performance Evaluation of Parallel Simd
A simulator for SIMD-type architectures is presented. Starting from an architecture-independent algorithm description based on recurrence equations, transformation steps for automatic parallelization, mapping and code generation are outlined. The final pseudo code program, together with architecture-dependent parameters and execution time tables, is fed into the simulator in order to gain perform...
Short vector code generation and adaptation for DSP algorithms
Most recent general purpose processors feature short vector SIMD instructions, like SSE on Pentium III/4. In this paper we automatically generate platform-adapted short vector code for DSP transform algorithms using SPIRAL. SPIRAL represents and generates fast algorithms as mathematical formulas, and translates them into code. Adaptation is achieved by searching in the space of algorithmic and ...
Automatic SIMD Parallelization of Embedded Applications Based on Pattern Recognition
This paper investigates the potential for automatic mapping of typical embedded applications to architectures with multimedia instruction set extensions. For this purpose a pattern-matching-based code transformation engine is used, which involves a three-step process of matching, condition checking and replacing of the source code. Experiments with DSP and MPEG2 encoder benchmarks show t...
Publication date: 2001